Introduction to NumPy

Combining arrays

We've seen how NumPy can perform simple operations on arrays by working on the elements inside, for example by multiplying every element by 100. It also provides ways to combine entire arrays, even if their shapes don't exactly match.

As ever, we need to start by importing numpy:

In [1]:
import numpy as np

Matching shape

If you have two arrays which exactly match each other in shape then you can directly combine them. So, if we have one (2, 3) array, grid_a:

In [2]:
grid_a = np.array([[1, 2, 3], [4, 5, 6]])
grid_a
Out[2]:
array([[1, 2, 3],
       [4, 5, 6]])

And another, grid_b, which is also (2, 3):

In [3]:
grid_b = np.array([[9, 8, 7], [6, 5, 4]])
grid_b
Out[3]:
array([[9, 8, 7],
       [6, 5, 4]])

Then we can do any numerical or logical operation between them, just as we did with an array and a single number. For example, a multiplication will multiply the values element-by-element ($1\times9=9$, $2\times8=16$, $3\times7=21$ etc.):

In [4]:
grid_a * grid_b
Out[4]:
array([[ 9, 16, 21],
       [24, 25, 24]])
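
The same element-by-element pairing applies to other operators. As a minimal sketch, reusing grid_a and grid_b from above (the names total and larger are only illustrative), here is an addition and a comparison:

```python
import numpy as np

grid_a = np.array([[1, 2, 3], [4, 5, 6]])
grid_b = np.array([[9, 8, 7], [6, 5, 4]])

# Addition pairs up elements in the same position: 1+9, 2+8, 3+7, ...
total = grid_a + grid_b
print(total)

# A comparison gives a Boolean array with the same (2, 3) shape
larger = grid_a > grid_b
print(larger)
```

Only the last element of larger is True, since 6 > 4 is the only position where grid_a holds the bigger value.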

If we define another array with a different shape, for example (3, 2):

In [5]:
grid_c = np.array([[5, 4], [6, 3], [7, 2]])
grid_c
Out[5]:
array([[5, 4],
       [6, 3],
       [7, 2]])

Then the multiplication will not work as the shapes don't exactly match:

In [6]:
grid_a * grid_c
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [6], line 1
----> 1 grid_a * grid_c

ValueError: operands could not be broadcast together with shapes (2,3) (3,2) 

In a way, this makes sense. Exactly what did we expect to be the result of grid_a * grid_c?

We see the term "broadcast" in that error message. We'll get back to that in a minute.
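
If you want your code to carry on after a mismatch like this, one option is to look at the shapes first and catch the error. This is only a sketch; the variable product and the printed message are illustrative choices, not part of NumPy:

```python
import numpy as np

grid_a = np.array([[1, 2, 3], [4, 5, 6]])
grid_c = np.array([[5, 4], [6, 3], [7, 2]])

# Inspect the shapes before trying to combine the arrays
print(grid_a.shape, grid_c.shape)

try:
    product = grid_a * grid_c
except ValueError as err:
    # NumPy raises ValueError when the shapes cannot be combined
    product = None
    print("Cannot combine:", err)
```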

Mismatched dimensions

It wouldn't be very useful if we could only ever work with exactly matching array shapes. Luckily, there are many situations where NumPy is able to combine arrays even if they don't match. For example, we can take an array:

In [7]:
a = np.array([6.0, 2.1, 8.2])
In [8]:
a.shape
Out[8]:
(3,)

and multiply it with grid_a:

In [9]:
grid_a * a
Out[9]:
array([[ 6. ,  4.2, 24.6],
       [24. , 10.5, 49.2]])

This works because it's combining an array grid_a with shape (2, 3) with another array a with shape (3,). NumPy is able to match them together by stretching a so that its dimensions match grid_a:

[ 1  2  3 ]
[ 4  5  6 ]
     ×
[ 6.0  2.1  8.2 ]

is treated as

[ 1  2  3 ]
[ 4  5  6 ]
     ×
[ 6.0  2.1  8.2 ]
[ 6.0  2.1  8.2 ]

which gives

[  6.0   4.2  24.6 ]
[ 24.0  10.5  49.2 ]

Note here we have switched to the row-format of one-dimensional arrays as it makes it easier to understand how they are combined.

This stretching operation is known as "broadcasting" in NumPy. A set of rules governs which array shapes can be combined with which others; they are detailed in the official broadcasting documentation.
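
To see the stretching for yourself, np.broadcast_to performs it explicitly. In the sketch below (the name stretched is just illustrative), a is expanded to the shape of grid_a, and multiplying by the expanded array gives the same answer as grid_a * a. Roughly speaking, NumPy compares shapes starting from the last dimension, and two dimensions are compatible when they are equal or one of them is 1.

```python
import numpy as np

grid_a = np.array([[1, 2, 3], [4, 5, 6]])
a = np.array([6.0, 2.1, 8.2])

# Explicitly stretch `a` from shape (3,) to shape (2, 3)
stretched = np.broadcast_to(a, grid_a.shape)
print(stretched)

# Multiplying by the stretched array matches the automatic broadcast
same = np.array_equal(grid_a * stretched, grid_a * a)
print(same)
```

In normal use you don't need to call np.broadcast_to yourself; it is shown here only to make the hidden step visible.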

So, if we try to combine grid_a with an array of shape (2,):

In [10]:
b = np.array([10, 10])
grid_a * b
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [10], line 2
      1 b = np.array([10, 10])
----> 2 grid_a * b

ValueError: operands could not be broadcast together with shapes (2,3) (2,) 

Then it fails since it was unable to broadcast a (2,) to a (2, 3).

There are ways to manipulate the arrays to make this work, all of which are covered in the documentation linked above. You might hope that NumPy would see that it's combining a (2, 3) with a (2,) and stretch in the other dimension, but the rules are designed to be simple and predictable, so you can always reason about them, rather than being too clever and magic.

As a final example, when you do a * 5, NumPy is effectively doing this stretching automatically behind the scenes:
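
One such manipulation, shown here as a sketch rather than the only approach, is to give b an extra axis of length 1 with np.newaxis so that it has shape (2, 1). That length-1 axis can then be stretched to match the 3 columns of grid_a. The name b_column is illustrative:

```python
import numpy as np

grid_a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([10, 10])

# Turn the (2,) array into a (2, 1) column
b_column = b[:, np.newaxis]
print(b_column.shape)

# (2, 3) combined with (2, 1): the length-1 axis is stretched to 3
result = grid_a * b_column
print(result)
```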

[ 6.0  2.1  8.2 ]
     ×
     5

is treated as

[ 6.0  2.1  8.2 ]
     ×
[  5    5    5  ]

which gives

[ 30.0  10.5  41.0 ]
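
A quick way to convince yourself of this is to build the stretched array of fives by hand and compare the two results. This is a sketch using np.full; the names fives and same are illustrative:

```python
import numpy as np

a = np.array([6.0, 2.1, 8.2])

# Build an array of fives with the same shape as `a`
fives = np.full(a.shape, 5)
print(fives)

# Multiplying by the scalar and by the array of fives gives the same values
same = np.array_equal(a * 5, a * fives)
print(same)
```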

Exercise

Once again, grab the "temperature" array. Remember, this is in units of Kelvin and is three-dimensional with axes of altitude, latitude and longitude. The altitude axis is layered such that the 0th layer is ground-level and each layer beyond that increases in altitude.

with np.load("weather_data.npz") as weather:
    temperature = weather["temperature"]

    # masks
    uk_mask = weather["uk"]
    irl_mask = weather["ireland"]
  • Calculate the maximum of the entire 3D temperature data set
  • Multiply the temperature data with the mask to extract only those values from within the UK.
  • Calculate the maximum of the UK data
  • Do the same with Ireland and Spain and compare the numbers
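
Before trying this on the real data, it may help to see the masking idea on a tiny made-up example. Everything in this sketch is hypothetical: the temperature values are invented, and it assumes the mask is a two-dimensional (latitude, longitude) array holding 1 inside the region and 0 outside, which may not match the masks in weather_data.npz exactly.

```python
import numpy as np

# Made-up data: 2 altitude layers, 2 latitudes, 3 longitudes (Kelvin)
temperature = np.array([
    [[280.0, 285.0, 290.0],
     [275.0, 283.0, 288.0]],
    [[270.0, 272.0, 279.0],
     [268.0, 271.0, 277.0]],
])

# Made-up 2D mask: 1 inside the region of interest, 0 outside
region_mask = np.array([[0, 1, 0],
                        [1, 1, 0]])

# The (2, 3) mask is broadcast across the altitude axis of the (2, 2, 3) data
masked = temperature * region_mask
print(masked.shape)

# Values outside the region are now 0, so the maximum comes from inside it
print(temperature.max(), masked.max())
```

Because temperatures in Kelvin are positive, the zeros introduced by the mask never win the maximum.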
